Compressed Data Structures for Dynamic Sequences

Authors

  • J. Ian Munro
  • Yakov Nekrich
Abstract

We consider the problem of storing a dynamic string S over an alphabet Σ = {1, ..., σ} in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: access(i, S) returns the i-th symbol in S, rank_a(i, S) counts how many times a symbol a occurs among the first i positions in S, and select_a(i, S) finds the position where a symbol a occurs for the i-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only nH_k + o(n log σ) bits, where H_k is the k-th order entropy and n is the string length. Moreover, our representation supports extraction of a substring S[i..i+ℓ] in optimal O(log n / log log n + ℓ / log_σ n) time.
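To make the semantics of these operations concrete, the following is a minimal sketch in Python. It is not the data structure from the paper: it stores the string uncompressed in a plain list and takes O(n) time per operation, so it only serves as a reference for what each query should return. The class name `NaiveDynamicSequence`, the method names, and the 1-based indexing convention are assumptions made for illustration.

```python
class NaiveDynamicSequence:
    """Reference semantics only: uncompressed, O(n) time per operation.

    Hypothetical illustration; positions are 1-based, as in the abstract.
    """

    def __init__(self, symbols=()):
        self._s = list(symbols)

    def access(self, i):
        """Return the i-th symbol of S."""
        return self._s[i - 1]

    def rank(self, a, i):
        """Count occurrences of symbol a among the first i positions."""
        return self._s[:i].count(a)

    def select(self, a, i):
        """Return the position of the i-th occurrence of a, or None."""
        seen = 0
        for pos, sym in enumerate(self._s, start=1):
            if sym == a:
                seen += 1
                if seen == i:
                    return pos
        return None

    def insert(self, i, a):
        """Insert symbol a so that it becomes the i-th symbol."""
        self._s.insert(i - 1, a)

    def delete(self, i):
        """Remove the i-th symbol."""
        del self._s[i - 1]

    def extract(self, i, length):
        """Return the substring S[i..i+length], both ends inclusive."""
        return self._s[i - 1 : i + length]
```

A compressed dynamic structure would have to return the same answers as this sketch while using space close to nH_k bits and far less time per operation.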


Similar resources

Space-efficient Data Structures for Collections of Textual Data

This thesis focuses on the design of succinct and compressed data structures for collections of string-based data, specifically sequences of semi-structured documents in textual format, sets of strings, and sequences of strings. The study of such collections is motivated by a large number of applications both in theory and practice. For textual semi-structured data, we introduce the concept of ...


A Framework of Dynamic Data Structures for String Processing

In this paper we present DYNAMIC, an open-source C++ library implementing dynamic compressed data structures for string manipulation. Our framework includes useful tools such as searchable partial sums, succinct/gap-encoded bitvectors, and entropy/run-length compressed strings and FM-indexes. We prove close-to-optimal theoretical bounds for the resources used by our structures, and show that ou...


Practical aspects of Compressed Suffix Arrays and FM-Index in Searching DNA Sequences

Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compress...


Grammar Compressed Sequences

Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. Several recent applications need to represent highly repetitive sequences, and classical statistical compression proves ineffective. We introduce, instead, grammar-based representations for repetitive sequences, which ...


Comparison of Seismic Behavior of Buckling-restrained Braces and Yielding Brace System in Irregular and Regular Steel Frames under Mainshock and Mainshock-Aftershock

Due to low stiffness of braces after yielding, the structures with buckling-restrained braces (BRBs) experience high residual drifts during an earthquake, which can be intensified by aftershocks and causes considerable damages to structures. Also, due to poor distribution of stiffness, this problem is exacerbated for irregular structures. Recently, the yielding brace system (YBS) has been intro...



Publication date: 2015